

WHAT IS CLAIMED IS:

1. A processing core comprising:
    one or more processing pipelines having a total of N-number of processing paths, each of said processing paths for processing instructions on M-bit data words; and
    a plurality of register files, each having Q-number of registers, said Q-number of registers being M-bits wide;
    wherein said Q-number of registers within each of said plurality of register files are either private or global registers, and wherein when a value is written to one of said Q-number of said registers which is a global register within one of said plurality of register files, said value is propagated to a corresponding global register in the other of said plurality of register files, and wherein when a value is written to one of said Q-number of said registers which is a private register within one of said plurality of register files, said value is not propagated to a corresponding register in the other of said plurality of register files.
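
The write behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `RegisterFile`, `make_core`, and `global_indices` are hypothetical, and propagation is modeled as an immediate copy, with no attempt to represent timing or the interconnect.

```python
class RegisterFile:
    """Hypothetical model of one register file with Q registers of M bits."""

    def __init__(self, num_registers, width_bits, global_indices):
        self.value_mask = (1 << width_bits) - 1   # keep values M bits wide
        self.regs = [0] * num_registers           # Q registers, all zero
        self.global_indices = set(global_indices) # registers treated as global
        self.peers = []                           # the other register files

    def write(self, index, value):
        value &= self.value_mask
        self.regs[index] = value
        # A global write is copied to the corresponding register in every
        # other register file; a private write stays local.
        if index in self.global_indices:
            for peer in self.peers:
                peer.regs[index] = value


def make_core(num_files, num_registers=64, width_bits=64, global_indices=()):
    """Build num_files register files that all see each other as peers."""
    files = [RegisterFile(num_registers, width_bits, global_indices)
             for _ in range(num_files)]
    for rf in files:
        rf.peers = [other for other in files if other is not rf]
    return files


files = make_core(2, global_indices={0, 1})
files[0].write(0, 42)   # register 0 is global: the value appears in both files
files[0].write(5, 7)    # register 5 is private: the value stays in files[0]
```

In this model the same set of global indices is shared by every register file, which is an assumption made for simplicity.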

2. The processing core as recited in claim 1, wherein every two of said N-number of processing paths share one of said plurality of register files.

3. The processing core as recited in claim 1, wherein a processing instruction comprises N-number of P-bit instructions appended together to form a very long instruction word (VLIW), and said N-number of processing paths process N-number of P-bit instructions in parallel.

4. The processing core as recited in claim 3, wherein M=64, Q=64, and P=32.
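
As an illustration of claims 3 and 4, the following sketch appends N sub-instructions of P=32 bits each into one instruction word and splits the word back into its fields. The function names `pack_vliw` and `unpack_vliw`, the choice N=4, and the placement of the first sub-instruction in the most significant bits are all assumptions made for the example; the claims do not specify them.

```python
def pack_vliw(sub_instructions, p_bits=32):
    """Append P-bit sub-instructions into one long instruction word.

    Assumption: the first sub-instruction ends up in the most significant
    field of the word.
    """
    word = 0
    for instr in sub_instructions:
        if not 0 <= instr < (1 << p_bits):
            raise ValueError("sub-instruction does not fit in P bits")
        word = (word << p_bits) | instr
    return word


def unpack_vliw(word, n, p_bits=32):
    """Split a VLIW word into its n P-bit fields, one per processing path."""
    field_mask = (1 << p_bits) - 1
    return [(word >> (p_bits * (n - 1 - i))) & field_mask for i in range(n)]


# Hypothetical example with N=4 paths and P=32, giving a 128-bit word.
word = pack_vliw([0x11111111, 0x22222222, 0x33333333, 0x44444444])
fields = unpack_vliw(word, n=4)
```

Each entry of `fields` would be handed to a separate processing path, which is how the N sub-instructions can be processed in parallel.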

5. The processing core as recited in claim 1, wherein said processing pipeline comprises an execute stage which includes an execute unit for each of said N-number of M-bit processing paths, each of said execute units comprising an integer processing unit, a load/store processing unit, a floating point processing unit, or any combination of one or more of said integer processing units, said load/store processing units, and said floating point processing units.

6. The processing core as recited in claim 5, wherein an integer processing unit and a floating point processing unit share one of said plurality of register files.

7. The processing core as recited in claim 1, wherein Q=64, and a 64-bit special register stores bits indicating whether a register in a register file is a private register or a global register, each bit in the 64-bit special register corresponding to one of said registers in said register file.
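
The 64-bit special register of claim 7 can be pictured as a bit mask with one bit per register. The sketch below is illustrative only: the helper names are hypothetical, and the conventions that bit i corresponds to register i and that a set bit means "global" are assumptions, since the claim does not fix the bit ordering or polarity.

```python
Q = 64                      # registers per register file, as in claim 7
ALL_BITS = (1 << Q) - 1     # keeps the special register 64 bits wide


def mark_global(special, index):
    """Return the special register with bit `index` set (register is global)."""
    return (special | (1 << index)) & ALL_BITS


def mark_private(special, index):
    """Return the special register with bit `index` cleared (register is private)."""
    return special & ~(1 << index) & ALL_BITS


def is_global(special, index):
    """True if the register at `index` is marked global."""
    return (special >> index) & 1 == 1


special = 0                          # start with every register private
special = mark_global(special, 0)    # register 0 becomes global
special = mark_global(special, 63)   # register 63 becomes global
special = mark_private(special, 0)   # register 0 is made private again
```

A write to a register would consult `is_global` to decide whether the value is propagated to the other register files.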

8. The processing core as recited in claim 1, wherein each of said plurality of register files is connected to a bus, and a value written to a global register in one of said plurality of register files is propagated to a corresponding global register in the other of said plurality of register files across said bus.

9. The processing core as recited in claim 1, wherein said plurality of register files are connected together in serial, and a value written to a first global register in a first of said plurality of register files is propagated to a corresponding first global register in a second of said plurality of register files connected directly to said first of said plurality of register files.
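
The serial arrangement of claim 9 can be sketched as a chain in which a global write is handed from one register file to the file directly connected to it. In this hypothetical model, `propagate_serial` and its parameters are illustrative names, each register file is a plain list, and forwarding the value onward past the directly connected neighbor, in both directions along the chain, is an assumption that goes beyond what the claim recites.

```python
def propagate_serial(files, source, index, value, global_indices):
    """Write `value` to register `index` of files[source].

    `files` is a list of register arrays in chain order. If the register is
    global, the value is passed neighbor to neighbor along the chain.
    Returns the chain positions in the order they were updated.
    """
    files[source][index] = value
    order = [source]
    if index not in global_indices:
        return order                      # private write: nothing is forwarded
    # Forward hop by hop toward the end of the chain...
    for pos in range(source + 1, len(files)):
        files[pos][index] = value
        order.append(pos)
    # ...and hop by hop toward the start of the chain.
    for pos in range(source - 1, -1, -1):
        files[pos][index] = value
        order.append(pos)
    return order


chain = [[0] * 4 for _ in range(3)]       # three small register files
order = propagate_serial(chain, source=1, index=2, value=9, global_indices={2})
```

The returned `order` makes the hop sequence visible, which is the main difference from a shared bus, where all register files could be updated in one transfer.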

10. A VLIW processing core comprising:
    one or more processing pipelines each including a fetch stage, a decode stage, an execute stage, and a write-back stage, said execute stage having an execute unit comprising an integer processing unit, a load/store processing unit, a floating point processing unit, or any combination of one or more of said integer processing units, said load/store processing units, and said floating point processing units; and
    a register file for each of said one or more processing pipelines;
    wherein an integer processing unit and a floating point processing unit within said one or more processing pipelines both access said register file.

11. In a computer system, a scalable computer processing architecture, comprising:
    one or more processor chips, each comprising:
    a processing core, including:
    a processing pipeline having N-number of processing paths, each of said processing paths for processing instructions on M-bit data words; and
    a plurality of register files, each having Q-number of registers, said Q-number of registers being M-bits wide;

15. The computer processing architecture as recited in claim 11, wherein said processing pipeline comprises an execute stage which includes an execute unit for each of said N-number of M-bit processing paths, each of said execute units comprising an integer processing unit, a load/store processing unit, a floating point processing unit, or any combination of one or more of said integer processing units, said load/store processing units, and said floating point processing units.

16. The computer processing architecture as recited in claim 15, wherein an integer processing unit and a floating point processing unit share one of said plurality of register files.

17. The computer processing architecture as recited in claim 11, wherein said Q-number of registers within each of said plurality of register files are either private or global registers, and wherein when a value is written to one of said Q-number of said registers which is a global register within one of said plurality of register files, said value is propagated to a corresponding global register in the other of said plurality of register files, and wherein when a value is written to one of said Q-number of said registers which is a private register within one of said plurality of register files, said value is not propagated to a corresponding register in the other of said plurality of register files.

18. The computer processing architecture as recited in claim 17, wherein Q=64, and a 64-bit special register stores bits indicating whether a register in a register file is a private register or a global register, each bit in the 64-bit special register corresponding to one of said registers in said register file.

19. The computer processing architecture as recited in claim 17, wherein each of said plurality of register files is connected to a bus, and a value written to a global register in one of said plurality of register files is propagated to a corresponding global register in the other of said plurality of register files across said bus.

20. The computer processing architecture as recited in claim 19, wherein said plurality of register files are connected together in serial, and a value written to a first global register in a first of said plurality of register files is propagated to a corresponding first global register in a second of said plurality of register files connected directly to said first of said plurality of register files.